An Evaluation of Synthetic Speech Using the PESQ Measure

Authors

  • Milos Cernak
  • Milan Rusko
Abstract

This paper presents experiments on using the perceptual objective measure ITU-T Rec. P.862, Perceptual Evaluation of Speech Quality (PESQ), for the automatic evaluation of synthetic speech. The approach tests whether the outputs of subjective and objective tests show a statistically significant correlation. We propose the following procedure to assess whether PESQ is suitable for synthetic speech. First, a list of test words is defined for the entire language. Second, the synthesizers under test generate a synthetic speech signal for every word in the list. Synthesizer engines of different quality were used to generate the stimuli: an LP synthesizer, a RELP synthesizer, and a PSOLA synthesizer, each in female and male versions. The resulting stimuli were evaluated in listening tests. Third, PESQ, taking the original human (reference) and synthesized (measured) recordings as inputs, is used to evaluate the overall quality of the synthesized signals. Finally, the correlation between the MOS from the listening tests and the objective MOS scores is calculated for each voice. Our results indicate a strong correlation between the subjective and objective evaluations of synthetic speech quality. We plan to use the PESQ measure for the automatic evaluation of new versions of synthetic voices, without the need for subjective tests. This approach can considerably speed up the development cycle of new synthetic voice versions. Using PESQ with the “original voice” as the reference is a rapid and repeatable technique for measuring synthetic voice quality that gives the developer results within moments.
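
The final step of the procedure, correlating listening-test MOS with objective MOS for one voice, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which correlation coefficient or significance test was used, so the Pearson coefficient and the t statistic below are assumptions, and the score lists are invented placeholder values, not results from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length sequences with at least 3 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for the null hypothesis of zero correlation (n - 2 degrees of freedom)."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical placeholder scores for one voice (NOT data from the paper).
# subjective_mos: mean opinion scores from a listening test.
# objective_mos: scores that a PESQ tool would report for the same stimuli.
subjective_mos = [1.8, 2.4, 3.1, 3.6, 4.2]
objective_mos = [1.5, 2.2, 2.9, 3.1, 3.9]

r = pearson_r(subjective_mos, objective_mos)
t = t_statistic(r, len(subjective_mos))
print(f"r = {r:.3f}, t = {t:.2f}, degrees of freedom = {len(subjective_mos) - 2}")
```

To judge significance, the t value would be compared against a Student's t critical value for n - 2 degrees of freedom at the chosen level; in practice, the objective scores would come from a P.862 implementation run on each reference and synthesized recording pair.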

Similar resources

Diagnostic Evaluation of Synthetic Speech Using Speech Recognition

The paper presents experiments on the use of automatic speech recognition for diagnostic evaluation of synthetic speech. Our previous work on the topic showed a strong correlation between the subjective and objective evaluation (ITU-T Rec. P.862 PESQ) of the quality of synthetic speech. The main drawback of the approach was the need for original human (reference) recordings in one to one mappin...

Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a maximum a posteriori (MAP) estimator based on Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...

Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs

Previous objective speech quality assessment models, such as bark spectral distortion (BSD), the perceptual speech quality measure (PSQM), and measuring normalizing blocks (MNB), have been found to be suitable for assessing only a limited range of distortions. A new model has therefore been developed for use across a wider range of network conditions, including analogue connections, codecs, pac...

Objective Quality Assessment of Wideband Speech Coding using W-PESQ Measure and Artificial Voice

An objective quality measurement methodology for wideband-speech coding has been studied, its essential components being an objective quality measure and an input test signal. Wideband-PESQ conforming to draft Recommendation P.862 has been studied as the objective quality measure. The Wideband-PESQ has been verified from the viewpoint of the consistency between subjectively evaluated MOS and ob...

Harmonics Enhancement for Determined Blind Sources Separation using Source’s Excitation Characteristics

We present an improved method for combining temporal and spectral processing approaches for multichannel determined blind source separation. The separation task is performed by applying spectral processing to mixed speech, using the sources’ excitation characteristics. The performance of the proposed method is investigated by separating two sources from a stereo recording mixture extracted fr...

Publication date: 2005